[Optimize] Improve performance like/not like filter through pushdown function to storage engine #10355

Merged
merged 10 commits into from
Jul 19, 2022

compasses
Contributor

Proposed changes

Issue Number: close #xxx

Problem Summary:

Describe the overview of changes.
To improve the performance of like/not like string matching, this PR pushes the function down to the storage engine. Tests show a 2x-3x performance gain on the following query:

select sum(lo_quantity),sum(lo_extendedprice),lo_orderdate from lineorder where lo_orderpriority like '%MED%' group by lo_orderdate order by lo_orderdate;

vectorized: 
before: ~0.6s,  after: ~0.3s

not vectorized:
before: ~3s,  after: ~1.2s

data filtered in segment reader:
![image](https://user-images.githubusercontent.com/10161171/175207195-b01aa536-f346-42dd-99c5-db395cb78f73.png)
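
As a rough illustration of the idea (not the code in this PR), the sketch below evaluates a `'%MED%'`-style substring match while scanning a string column, so that only the surviving row ids reach later stages. `SubstringLikePredicate` and `scan_with_pushdown` are hypothetical names, and the in-memory `std::vector<std::string>` column is an assumption standing in for the segment reader.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a pushed-down LIKE '%needle%' / NOT LIKE filter.
struct SubstringLikePredicate {
    std::string needle; // text between the two '%' wildcards
    bool negated;       // true models NOT LIKE

    bool matches(const std::string& value) const {
        const bool found = value.find(needle) != std::string::npos;
        return found != negated;
    }
};

// Scans one string column and returns the ids of the rows that pass.
// Rows that fail are dropped here, before any other column is touched.
std::vector<uint32_t> scan_with_pushdown(const std::vector<std::string>& column,
                                         const SubstringLikePredicate& pred) {
    std::vector<uint32_t> selected;
    for (std::size_t row = 0; row < column.size(); ++row) {
        if (pred.matches(column[row])) {
            selected.push_back(static_cast<uint32_t>(row));
        }
    }
    return selected;
}
```

In this model, the aggregation over `lo_quantity` and `lo_extendedprice` would only need to read the rows listed in the returned vector, which is the kind of early filtering the "data filtered in segment reader" counter reflects.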


Checklist(Required)

  1. Does it affect the original behavior: (No)
  2. Has unit tests been added: (No Need)
  3. Has document been added or modified: (No Need)
  4. Does it need to update dependencies: (No)
  5. Are there any changes that cannot be rolled back: (No)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

Contributor Author

compasses commented Jun 25, 2022

@mrhhsg Hi, could you help check the code below? It comes from your recent commits, and it prevents the function pushdown in this PR from working in vectorized mode.

BTW, pushing down like / not like gives about a 2x performance gain in the vectorized execution engine.

bool SegmentIterator::_can_evaluated_by_vectorized(ColumnPredicate* predicate) {
    auto cid = predicate->column_id();
    FieldType field_type = _schema.column(cid)->type();
    switch (predicate->type()) {
    case PredicateType::EQ:
    case PredicateType::NE:
    case PredicateType::LE:
    case PredicateType::LT:
    case PredicateType::GE:
    case PredicateType::GT: {
        if (field_type == OLAP_FIELD_TYPE_VARCHAR || field_type == OLAP_FIELD_TYPE_CHAR ||
            field_type == OLAP_FIELD_TYPE_STRING) {
            return config::enable_low_cardinality_optimize &&
                   _column_iterators[cid]->is_all_dict_encoding();
        } else if (field_type == OLAP_FIELD_TYPE_DECIMAL) {
            return false;
        }
        return true;
    }
    default:
        return false;
    }
}
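
The check quoted above can be read as a small decision table. The sketch below is a standalone, simplified model of it and not the Doris code: the enums, `ColumnInfo`, and the free function `can_evaluate_vectorized` are hypothetical stand-ins, `all_dict_encoded` stands for the result of `is_all_dict_encoding()`, and the config flag is passed in as a plain parameter.

```cpp
// Simplified, hypothetical stand-ins for the Doris types in the quoted code.
enum class PredicateType { EQ, NE, LT, LE, GT, GE, OTHER };
enum class FieldType { INT, DECIMAL, CHAR, VARCHAR, STRING };

struct ColumnInfo {
    FieldType type;
    bool all_dict_encoded; // models _column_iterators[cid]->is_all_dict_encoding()
};

// Mirrors the branch structure of the quoted function:
// comparison predicates qualify, except on DECIMAL columns, and on string
// columns only when low-cardinality optimization is on and every page is
// dictionary encoded. Any other predicate type does not qualify.
bool can_evaluate_vectorized(PredicateType pred, const ColumnInfo& col,
                             bool enable_low_cardinality_optimize) {
    switch (pred) {
    case PredicateType::EQ:
    case PredicateType::NE:
    case PredicateType::LE:
    case PredicateType::LT:
    case PredicateType::GE:
    case PredicateType::GT: {
        if (col.type == FieldType::VARCHAR || col.type == FieldType::CHAR ||
            col.type == FieldType::STRING) {
            return enable_low_cardinality_optimize && col.all_dict_encoded;
        } else if (col.type == FieldType::DECIMAL) {
            return false;
        }
        return true;
    }
    default:
        return false;
    }
}
```

Under this model, the answer for a predicate on a string column depends on both the config flag and the column's encoding, so the same predicate may be evaluated on different paths for different segments.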

@compasses compasses changed the title Improve performance like/not like filter through pushdown function to storage engine [Optimize] Improve performance like/not like filter through pushdown function to storage engine Jun 25, 2022
Member

mrhhsg commented Jun 25, 2022

@compasses I am not sure; this logic should be the same as before.

@compasses
Contributor Author

OK. Previously the like predicate went through _short_cir_eval_predicate, and now it goes through _pre_eval_block_predicate, so I made the like predicate support both paths.
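
For illustration only, here is a minimal sketch of a predicate that can be driven in two ways. It is not the class from this PR: `LikePredicateSketch` is a hypothetical name, the signatures are simplified, and it assumes one path narrows a selection vector of row ids while the other writes one boolean flag per row.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified like predicate; the real Doris interfaces differ.
class LikePredicateSketch {
public:
    LikePredicateSketch(std::string needle, bool negated)
            : _needle(std::move(needle)), _negated(negated) {}

    // Selection-vector style: keeps only the row ids in `sel` that match,
    // compacting them in place, and returns the new number of selected rows.
    uint16_t evaluate(const std::vector<std::string>& column, uint16_t* sel,
                      uint16_t size) const {
        uint16_t kept = 0;
        for (uint16_t i = 0; i < size; ++i) {
            const uint16_t row = sel[i];
            if (matches(column[row])) {
                sel[kept++] = row;
            }
        }
        return kept;
    }

    // Flag style: writes one result per row for the first `size` rows.
    void evaluate_vec(const std::vector<std::string>& column, uint16_t size,
                      bool* flags) const {
        for (uint16_t i = 0; i < size; ++i) {
            flags[i] = matches(column[i]);
        }
    }

private:
    // Substring match models LIKE '%needle%'; `_negated` models NOT LIKE.
    bool matches(const std::string& value) const {
        return (value.find(_needle) != std::string::npos) != _negated;
    }

    std::string _needle;
    bool _negated;
};
```

Both methods share the same `matches` helper, so the two paths cannot disagree on which rows pass; only the way the result is reported differs.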

@Gabriel39 Hi, could you help review this PR? I hope it can be merged ASAP, because I am concerned it will conflict with other PRs and I would need to keep merging to fix the conflicts :).

Gabriel39
Gabriel39 previously approved these changes Jul 5, 2022
Contributor

@Gabriel39 Gabriel39 left a comment

LGTM

Contributor

github-actions bot commented Jul 5, 2022

PR approved by anyone and no changes requested.

Contributor

@yiguolei yiguolei left a comment

LGTM

@yiguolei yiguolei merged commit f6cb7a8 into apache:master Jul 19, 2022
miswujian pushed a commit to miswujian/doris that referenced this pull request Jul 28, 2022
…function to storage engine (apache#10355)

* support like/not like conjuncts push down to storage engine
* vectorized engine support like/not like conjuncts push down to storage engine
* support both evaluate and evaluate_vec method in like predicate
* reuse remove_pushed_conjuncts and prevent logic error during move function conjuncts
* change #ifndef to pragma once as per comments
* change enable_function_pushdown default to false
Co-authored-by: heguangnan <heguangnan@bytedance.com>
whutpencil pushed a commit to whutpencil/incubator-doris that referenced this pull request Jul 29, 2022
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Aug 1, 2022
yiguolei pushed a commit that referenced this pull request Feb 27, 2023
function pushdown: #10355
NGram BloomFilter Index apply like pushdown: #11579

Enabled by default, make sure it stays active.

If NGram BloomFilter Index is not used, this like pushdown can be replaced by #15917, which can push down all expressions including like.
Yulei-Yang pushed a commit to Yulei-Yang/doris that referenced this pull request Mar 5, 2023
yagagagaga pushed a commit to yagagagaga/doris that referenced this pull request Mar 9, 2023
4 participants